THE LEAST SQUARES FIT OF AN ALGEBRAICALLY UNSPECIFIED FORM

There are occasions when an accurate yet simple algebraic
description of an established graphic form cannot readily
be devised. Under such circumstances, the graphic form
itself may be used as an independent variate transform to be
scaled to any data-set $Y_i$ by Least Squares.

Consider the case where a sensibly smooth curve has been
fitted by "eye" 1/ through a set of plotted data points and it
is desired to fit this curve through the same data or new
data by the Least Squares Fitting System. One may proceed as
follows:

Let a two-dimensional, graphed curve be based on, say, 30
paired values of Y and X, dependent and independent variates
respectively. For each of the 30 X-values, read the
corresponding graphic-curve value of Y. Call these values $X_g$ and
let them take the place of the usual algebraic transform(s)
of X, to be scaled to the $Y_i$ by Least Squares. Then, fit
the model

    $\hat{Y} = \hat{a} + \hat{b} X_g$

through the $Y_i$ using the 30 paired values of Y and $X_g$, where
$\hat{b}$ 2/ will be close to 1.00 if the graphic fit approaches the
Least Squares fit, and $\hat{a}$ will be close to zero.

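The total sum of squares of deviations of $Y_i$ from $\bar{Y}$ is

    $SS_{total} = \sum y^2 = \sum Y^2 - (\sum Y)^2 / n$

The sum of squares attributable to regression is

    $SS_{reg.} = \hat{b} \sum xy$

where:

    $\hat{b} = \sum xy / \sum x^2$
    $\sum xy = \sum X_g Y - \sum X_g \sum Y / n$
    $\sum x^2 = \sum X_g^2 - (\sum X_g)^2 / n$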


1/  i.e.:   An  approximate  Least  Deviations  Fitting  System. 
2/ ^ denotes an estimated quantity.


And, the sum of squared deviations about the form which is
fit through the data is

    $SS_{res.} = SS_{total} - SS_{reg.}$

    $R^2 = SS_{reg.} / SS_{total}$

A test of significance for regression would be

    $F_{1,\,n-2} = \frac{SS_{reg.}}{SS_{res.}/(n-2)} = \frac{MS_{reg.}}{MS_{error}}$

but is subject to question when the $X_g$-form is not derived
independently of the data through which it is being fit by
Least Squares.
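The computations above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original note: the function name fit_graphic_form and the sample readings are hypothetical, and x_g is assumed to hold the curve values $X_g$ read from the graph at each observed X.

```python
def fit_graphic_form(x_g, y):
    """Scale graph-read curve values x_g to observations y by least squares.

    Returns the intercept a, slope b, the sums of squares, R^2 and F.
    """
    n = len(y)
    if n != len(x_g) or n < 3:
        raise ValueError("need at least 3 paired values of x_g and y")

    sum_x, sum_y = sum(x_g), sum(y)
    # Corrected (deviation) sums of squares and products.
    sxx = sum(x * x for x in x_g) - sum_x ** 2 / n
    sxy = sum(x * yi for x, yi in zip(x_g, y)) - sum_x * sum_y / n
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n

    b = sxy / sxx
    a = sum_y / n - b * sum_x / n

    ss_reg = b * sxy
    ss_res = syy - ss_reg
    ms_error = ss_res / (n - 2)
    # A perfect fit leaves no residual variation, so F is unbounded.
    f = ss_reg / ms_error if ms_error > 0 else float("inf")

    return {"a": a, "b": b, "ss_total": syy, "ss_reg": ss_reg,
            "ss_res": ss_res, "r2": ss_reg / syy, "f": f}


# Hypothetical curve readings and observations, for illustration only.
x_g = [2.0, 4.1, 6.3, 7.9, 10.2]
y = [1.8, 4.4, 6.0, 8.3, 9.9]
result = fit_graphic_form(x_g, y)
print(result)
```

If the eye-fitted curve already tracks the data well, the returned b should lie near 1 and a near 0, as stated above.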

EXAMPLE 

A sawmill study of 100 boards provided an array of
frequencies by board-thickness groups, to which the fitting method
described herein was applied; see the graphs and computations
that follow.
LEAST SQUARES FIT OF ORIGINAL, SMOOTHED FORM TO THE $Y_i$:
$\hat{Y} = .788 + .9272 X_g$
    $\sum X_g^2$ = 2,214.38
    $(\sum X_g)^2/n$ = $(100.2)^2/9$ = 1,115.56
    $\sum x^2$ = 1,098.82

    $\sum X_g Y$ = 2,132.20
    $\sum X_g \sum Y/n$ = (100.2)(100)/9 = 1,113.33
    $\sum xy$ = 1,018.87

    $\hat{b} = \sum xy / \sum x^2$ = 1,018.87/1,098.82 = .92724
    $\hat{a} = \bar{Y} - \hat{b}\bar{X}_g$ = 100/9 - .92724(11.1333) = .788

Then:    $\hat{Y} = .788 + .9272 X_g$


And:

    $\sum Y^2$ = 2,104.0
    $(\sum Y)^2/n$ = $(100)^2/9$ = 1,111.1
    $\sum y^2$ = 992.9

    $SS_{total} = \sum y^2$ = 992.9

    $SS_{reg.} = \hat{b} \sum xy$ = .92724(1,018.87) = 944.7

    $SS_{res.}$ = 992.9 - 944.7 = 48.2

    $R^2$ = 944.7/992.9 = .95

Had  the  curve  form  adopted  been  developed  from  a  prior  set 
of  related  data,  or  from  hypothesis,  tests  of  regression 
significance  would  be  appropriate,  as  follows: 

    $F_{1,(9-2)}$ = 944.7/(48.2/7) = 137

which, of course, is significant beyond the 1 percent level.
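The arithmetic of the example can be checked directly from the summary sums quoted above. The following Python sketch uses only those published totals (n = 9 thickness groups); the variable names are illustrative, and small differences in the last digit arise because the note rounds intermediate values.

```python
# Summary sums taken from the worked example (n = 9 groups).
n = 9
sum_x, sum_y = 100.2, 100.0      # sum of X_g, sum of Y
sum_x2 = 2214.38                 # sum of X_g squared
sum_xy = 2132.20                 # sum of X_g * Y
sum_y2 = 2104.0                  # sum of Y squared

# Corrected sums of squares and products.
sxx = sum_x2 - sum_x ** 2 / n        # note gives 1,098.82
sxy = sum_xy - sum_x * sum_y / n     # note gives 1,018.87
syy = sum_y2 - sum_y ** 2 / n        # note gives 992.9

b = sxy / sxx                        # note gives .92724
a = sum_y / n - b * sum_x / n        # note gives .788

ss_reg = b * sxy                     # note gives 944.7
ss_res = syy - ss_reg                # note gives 48.2
r2 = ss_reg / syy                    # note gives .95
f = ss_reg / (ss_res / (n - 2))      # note gives 137

print(f"b = {b:.5f}, a = {a:.3f}")
print(f"SS_reg = {ss_reg:.1f}, SS_res = {ss_res:.1f}")
print(f"R2 = {r2:.2f}, F = {f:.0f}")
```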
To obtain $\hat{Y}$ for other than the original 30 X-values, either
return to the original graph for corresponding $X_g$ and solve
for $\hat{Y}$ in the derived form $\hat{Y} = \hat{a} + \hat{b} X_g$; or perhaps more simply,
plot $\hat{Y}$ for, say, 10 values of X across the range of X. Then
join such points by straight lines or suitably smoothed
curves, and read $\hat{Y}$ directly from the graph.

To fit the original, smoothed form to a new set of (X,Y)-
values, simply read the $X_g$ for the new X-values and proceed
as above.
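Reading the curve at new X-values can be imitated numerically when the graph has been digitized. The Python sketch below is an assumption-laden illustration: the digitized points curve_x and curve_y are hypothetical, straight lines are used to join them (one of the two options mentioned above), and the coefficients .788 and .9272 are those of the worked example.

```python
from bisect import bisect_right

# Hypothetical points digitized from the eye-fitted curve (X, curve height).
curve_x = [1.0, 2.0, 3.0, 4.0, 5.0]
curve_y = [2.0, 6.5, 14.0, 20.5, 23.0]

# Coefficients from the worked example: Y-hat = .788 + .9272 X_g.
a, b = 0.788, 0.9272


def read_curve(x):
    """Return X_g at x by joining the digitized points with straight lines."""
    if not curve_x[0] <= x <= curve_x[-1]:
        raise ValueError("x lies outside the range covered by the graph")
    i = bisect_right(curve_x, x) - 1
    if i == len(curve_x) - 1:        # x is exactly the last digitized point
        return curve_y[-1]
    x0, x1 = curve_x[i], curve_x[i + 1]
    t = (x - x0) / (x1 - x0)
    return curve_y[i] + t * (curve_y[i + 1] - curve_y[i])


def predict(x):
    """Scale the curve reading by the least squares coefficients."""
    return a + b * read_curve(x)


print(predict(1.5))
```

The function refuses to extrapolate beyond the digitized range, since the graph supplies no curve value there.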

The fitting system is analogous for cases involving more
than two dimensions and permits the contribution of one or
more variates to $SS_{reg.}$ in any fitting order to be evaluated
when alternative regressions are graphed and then fitted as
above.
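One reading of this paragraph is that each alternative graphed regression supplies its own column of curve values, each is fitted as above, and the difference in $SS_{reg.}$ is credited to the added variate. The Python sketch below follows that interpretation only; the data, the two columns of graph readings, and the function name ss_regression are all hypothetical.

```python
def ss_regression(x_g, y):
    """Return (SS_reg, SS_total) for the simple fit of y on graph readings x_g."""
    n = len(y)
    sum_x, sum_y = sum(x_g), sum(y)
    sxx = sum(x * x for x in x_g) - sum_x ** 2 / n
    sxy = sum(x * yi for x, yi in zip(x_g, y)) - sum_x * sum_y / n
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return sxy ** 2 / sxx, syy       # b * sxy equals sxy^2 / sxx


# Hypothetical observations.
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
# Hypothetical readings from a curve graphed on the first variate alone.
x_g_first = [4.0, 5.0, 6.0, 9.0, 10.0, 12.0]
# Hypothetical readings from a form graphed on the first and second variates.
x_g_both = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]

ss_first, ss_total = ss_regression(x_g_first, y)
ss_both, _ = ss_regression(x_g_both, y)

# SS_reg credited to the second variate when it is fitted after the first.
gain = ss_both - ss_first
print(f"SS_reg (first variate) = {ss_first:.2f}")
print(f"SS_reg (both variates) = {ss_both:.2f}")
print(f"contribution of second variate = {gain:.2f}")
```

Because graphic forms are not nested least squares fits, the difference is not guaranteed to be positive; a negative value would suggest the more elaborate graph was drawn less well than the simpler one.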

EXTREME CAUTION is urged in developing the "sensibly
smoothed curve," which is assumed for the fitting process
described here. Before looking at your data, construct an
expected response form covering the extremes of the
independent variate. Usually you can be quite certain as to the
slope and value of the curve at such extremes, e.g. flat,
tending to zero, or vertical, tending to infinity. Then,
what happens to the curve between the extremes becomes
self-evident; see the dotted lines in the graphed examples below:
Points in the range of the independent variate at which
important  changes  in  slope  occur  are  sometimes  predictable 
from  past  experience.   Exert  every  effort  to  bring  known 
information  to  bear  on  this  estimate--even  quantification 
where  possible.   Then  observe  the  shape  of  the  expected 
curve  in  the  range  of  the  independent  variate  relevant  to 
your  data.   Fit  a  similar  curve  through  your  data  points, 
allowing  the  data  to  guide  you  as  to  slope  and  points  of 
critical  slope  change.   The  resulting  form  constitutes  a 
"sensibly  smoothed  curve"  for  the  purposes  of  this  paper. 

Chester  E.  Jensen,  statistician 

Central  States  Forest  Experiment  Station 